Abstract — In self-service restaurants, food recognition
algorithms can enable both monitoring of food consumption and
automatic billing of the meal picked up by the customer. The
latter is particularly relevant because it removes the need to
select the chosen dishes manually, which speeds up the service
these restaurants offer. With the number of Internet users now
very large, and with the aim of saving time and avoiding manual
billing, we propose an automatic billing alert system with
globalization of the restaurant menu. The proposed design uses
an Android application that lets the user select items from the
menu, which is globalized on the IoT screen or in the Android
database. The selected food image is processed using image
processing techniques implemented in MATLAB. Food recognition
is performed by a Deep Neural Network, and the bill estimate is
transferred to the Android application's display screen, where
a pop-up message shows the payable amount.

Index Terms — Android, IoT, Globalized.

I. INTRODUCTION 

Image processing is the method used here to detect the food on
the plate, with a Deep Neural Network performing the
recognition. Image processing is also the physical process used
to convert an image signal into a physical image. The image
signal can be either digital or analog, and the output can be
either an actual physical image or the characteristics of an
image. In the proposed system, the image of the food on the
plate is captured with a mobile phone or camera. The image is
preprocessed so that the kind of food can be identified.
Preprocessing converts the image from RGB values to an HSV
image. The HSV image then undergoes a process known as bounding
box detection, which locates the food on the plate by
eliminating the rest of the plate area. This helps to identify
the food and its calories. After each food on the plate has
been identified, the bill is estimated. Each restaurant has its
own prices and its own texture for its food, so a database is
maintained for each restaurant. The database contains the
restaurant's food images and their prices, which makes it
possible to look up the price of each dish and to calculate the
amount for the food ordered by the customer. The calculated
amount is sent to the customer's mobile phone as a pop-up
message. The payment can then be made using a mobile app such
as PayZapp.

The image processing is done in MATLAB. MATLAB is a
high-performance language for technical computing that
integrates computation, visualization, and programming in an
easy-to-use environment where problems and solutions are
expressed in familiar mathematical notation. It is an
interactive system whose basic data element is an array that
does not require dimensioning. It allows many technical
computing problems, especially those with matrix and vector
formulations, to be solved in a fraction of the time it would
take to write a program in a scalar non-interactive language
such as C or Fortran.

II. EXISTING SYSTEM 

The existing system, Semantic Food Detection, integrates food
localization, recognition and segmentation in the same
framework. It is applied to the problem of food tray analysis
in self-service restaurants. Its authors integrate both
techniques, food/non-food semantic segmentation and food
detection, through the application of two procedures: a
probabilistic procedure that removes the background detections,
and a custom non-maximum suppression procedure that avoids the
occurrence of duplicate detections. Regarding the architecture,
two pathways are used in parallel, one for food detection and
one for semantic segmentation. The purpose of this separate
computation is to take advantage of the benefits of each method
separately and to combine them later. In this manner, the two
methods do not condition each other but reinforce each other.
In particular, if an end-to-end architecture fed the
segmentation output directly into the detection, segmentation
errors could not be recovered and could therefore negatively
influence the detection performance. The system significantly
outperforms the state of the art in terms of recall and mean
average accuracy. Furthermore, the model is less sensitive to
class imbalance and to the mean of errors per food placed on a
tray. CNN-based models have progressively improved the results
of food recognition.
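
To make the duplicate-removal step concrete, the following is a minimal sketch of standard greedy non-maximum suppression. It is not the custom procedure of the existing system, whose details are not given here. It is written in Python rather than MATLAB purely for illustration, and the box format `(x1, y1, x2, y2)` and the 0.5 overlap threshold are assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(detections, iou_threshold=0.5):
    """Greedy NMS over a list of (box, score) pairs.

    The highest-scoring detection is kept, every remaining detection
    that overlaps it by at least iou_threshold is dropped, and the
    process repeats on what is left.
    """
    remaining = sorted(detections, key=lambda d: d[1], reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)
        kept.append(best)
        remaining = [d for d in remaining
                     if iou(best[0], d[0]) < iou_threshold]
    return kept
```

Given two nearly coincident boxes around the same dish and a third box elsewhere on the tray, this keeps the higher-scoring box of the pair together with the third box.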

III. LITERATURE SURVEY: 

This paper considers the problem of recipe-oriented 
image-ingredient correlation learning with multi-attributes for 
recipe retrieval and exploration. Existing methods mainly 
focus on food visual information for recognition while we 
model visual information, textual content (e.g., ingredients), 
and attributes (e.g., cuisine and course) together to solve 
extended recipe-oriented problems, such as multimodal 
cuisine classification and attribute enhanced food image 
retrieval. As a solution, we propose a multimodal multitask 
deep belief network (M3TDBN) to learn joint 
image-ingredient representation regularized by different 
attributes. By grouping ingredients into visible ingredients 
(which are visible in the food image, e.g., “chicken” and 
“mushroom”) and non-visible ingredients (e.g., “salt” and 
“oil”), M3TDBN is capable of learning both midlevel visual 
representation between images and visible ingredients and 
non-visual representation. Furthermore, in order to utilize 
different attributes to improve the inter-modality correlation, 
M3TDBN incorporates multitask learning to make different 
attributes collaborate with each other. Based on the proposed 
M3TDBN, we exploit the derived deep features and the 
discovered correlations for three extended novel applications: 
1) multimodal cuisine classification; 2) attribute-augmented 
cross-modal recipe image retrieval; and 3) ingredient and 
attribute inference from food images. The proposed approach 
is evaluated on the constructed Yummly dataset and the 
evaluation results have validated the effectiveness of the 
proposed approach. [1] 

In this work we address the task of semantic image 
segmentation with Deep Learning and make three main 
contributions that are experimentally shown to have 
substantial practical merit. First, we highlight convolution 
with upsampled filters, or “atrous convolution”, as a powerful 
tool in dense prediction tasks. Atrous convolution allows us to 
explicitly control the resolution at which feature responses are 
computed within Deep Convolutional Neural Networks (DCNNs). It 
also allows us to effectively enlarge the field of view of 
filters to incorporate larger context without increasing the 
number of parameters or the amount of computation. Second, we 
propose atrous spatial pyramid pooling (ASPP) to robustly 
segment objects at multiple scales. ASPP probes an incoming 
convolutional feature layer with filters at multiple sampling 
rates and effective fields-of-view, thus capturing objects as 
well as image context at multiple scales. Third, we improve 
the localization of object boundaries by combining methods 
from DCNNs and probabilistic graphical models. The 
commonly deployed combination of max-pooling and 
downsampling in DCNNs achieves invariance but takes a toll on 
localization accuracy. We overcome this by combining the 
responses at the final DCNN layer with a fully connected 
Conditional Random Field (CRF), which is shown both 
qualitatively and quantitatively to improve localization 
performance. Our proposed “DeepLab” system sets the new 
state of the art at the PASCAL VOC-2012 semantic image 
segmentation task, reaching 79.7% mIOU on the test set, and 
advances the results on three other datasets: 
PASCAL-Context, PASCAL-Person-Part, and Cityscapes. 
All of our code is made publicly available online. [2] 
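
The claim that atrous convolution enlarges the field of view without adding parameters can be illustrated in one dimension. The sketch below is an illustration only, not code from [2]. It is written in plain Python, and the function names are hypothetical. The same three-tap kernel is applied with its taps spaced `rate` samples apart.

```python
def atrous_conv1d(signal, kernel, rate=1):
    """'Valid' 1-D convolution (cross-correlation form) with dilation.

    Kernel taps are applied `rate` samples apart, so the number of
    weights stays the same while the span of input they cover grows.
    """
    span = (len(kernel) - 1) * rate + 1
    out = []
    for start in range(len(signal) - span + 1):
        out.append(sum(w * signal[start + k * rate]
                       for k, w in enumerate(kernel)))
    return out


def field_of_view(kernel_size, rate):
    """Extent of input covered by a dilated filter, in samples."""
    return (kernel_size - 1) * rate + 1
```

With a three-tap kernel, `rate=1` covers 3 input samples and `rate=2` covers 5, while the kernel still holds only three weights.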

Food image recognition is one of the promising applications 
of visual object recognition in computer vision. In this study, 
a small-scale dataset consisting of 5822 images of ten 
categories and a five-layer CNN were constructed to recognize 
these images. The bag-of-features (BoF) model coupled with a 
support vector machine was first tested for comparison, 
resulting in an overall accuracy of 56%, while the CNN 
performed much better with an overall accuracy of 74%. Data 
expansion techniques were applied to increase the number of 
training images, which raised the accuracy to more than 90% 
and prevented the overfitting that occurred in the CNN trained 
without data expansion. Further improvement is within reach by 
collecting more images and optimizing the network architecture 
and relevant hyper-parameters. [3] 
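
The specific data expansion techniques used in [3] are not listed here. As a hedged illustration of the general idea, the sketch below generates extra training samples from one image by flipping and rotating it. The choice of transformations and the list-of-rows image format are assumptions.

```python
def flip_horizontal(img):
    """Mirror each row of a 2-D image stored as a list of rows."""
    return [row[::-1] for row in img]


def flip_vertical(img):
    """Reverse the order of the rows."""
    return [row[:] for row in img[::-1]]


def rotate_90(img):
    """Rotate the image clockwise by 90 degrees."""
    return [list(col) for col in zip(*img[::-1])]


def expand(img):
    """Return the original image plus three transformed copies."""
    return [img, flip_horizontal(img), flip_vertical(img), rotate_90(img)]
```

Each call to `expand` turns one labelled image into four, all of which keep the original food label.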

We propose a new dataset for the evaluation of food 
recognition algorithms that can be used in dietary monitoring 
applications. Each image depicts a real canteen tray with 
dishes and foods arranged in different ways. Each tray 
contains multiple instances of food classes. The dataset 
contains 1027 canteen trays for a total of 3616 food instances 
belonging to 73 food classes. The food on the tray images has 
been manually segmented using carefully drawn polygonal 
boundaries. We have benchmarked the dataset by designing 
an automatic tray analysis pipeline that takes a tray image as 
input, finds the regions of interest, and predicts for each 
region the corresponding food class. We have experimented 
with three different classification strategies, also using 
several visual descriptors. We achieve about 79% of food and 
tray recognition accuracy using 
convolutional-neural-network-based features. The dataset, as 
well as the benchmark framework, are available to the research 
community. [4] 

Convolutional networks are powerful visual models that yield 
hierarchies of features. We show that convolutional networks 
by themselves, trained end-to-end, pixels-to-pixels, improve 
on the previous best result in semantic segmentation. Our key 
insight is to build “fully convolutional” networks that take 
input of arbitrary size and produce correspondingly-sized 
output with efficient inference and learning. We define and 
detail the space of fully convolutional networks, explain their 
application to spatially dense prediction tasks, and draw 
connections to prior models. We adapt contemporary 
classification networks (AlexNet, the VGG net, and GoogLeNet) 
into fully convolutional networks and transfer their learned 
representations by fine-tuning to the segmentation task. We 
then define a skip architecture that combines semantic 
information from a deep, coarse layer with appearance 
information from a shallow, fine layer to produce accurate 
and detailed segmentations. Our fully convolutional network 
achieves improved segmentation of PASCAL VOC (30% 
relative improvement to 67.2% mean IU on 2012), NYUDv2, 
SIFT Flow, and PASCAL-Context, while inference takes one 
tenth of a second for a typical image. [5] 
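
The segmentation results quoted in this survey are reported as mean intersection over union (mean IU, also written mIOU). As a reading aid, the sketch below computes it from flattened label maps. It is a simplified illustration in Python, not the official PASCAL VOC evaluation code. Skipping classes that appear in neither map is an assumption.

```python
def mean_iu(pred, truth, num_classes):
    """Mean intersection over union between two flat label sequences.

    For each class, IU = |pred == c and truth == c| / |pred == c or truth == c|.
    Classes absent from both sequences are left out of the average.
    """
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, truth) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, truth) if p == c or t == c)
        if union:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0
```

For `pred = [0, 0, 1, 1]` and `truth = [0, 1, 1, 1]`, class 0 has IU 1/2 and class 1 has IU 2/3, so the mean IU is 7/12.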

IV. PROPOSED SYSTEM 

The proposed system is an automatic billing alert system with 
globalization of the restaurant menu. The proposed design uses 
an Android application that lets the user select items from the 
menu, which is globalized on the IoT screen or in the Android 
database. The selected food image is processed using image 
processing in MATLAB, and the bill estimate is transferred to 
the Android application's display screen, where a pop-up 
message shows the payable amount. The proposed system uses an 
Adaptive Deep Network (ADN) to obtain adjustable as well as 
accurate results. The ADN attempts to exploit the sparsity 
(complexity) of neuron connections, and its memory 
computations are adjustable. The system saves time because the 
bill is paid by phone: there is no need to wait for the bill 
after the meal, and the amount is known as soon as all the 
ordered food reaches the table. It is implemented as an 
Android app. While identifying the dishes, it also calculates 
their calorie values, which is useful for customers who are on 
a diet, for diabetic patients, and for those who care about 
their health. 

V. ARCHITECTURAL DIAGRAM AND 
EXPLANATION: 



FIG 1 PROPOSED ARCHITECTURE 

The dish ordered by the client reaches the table, where it is 
photographed by a camera mounted above and facing the table. 
The captured image is then sent to the back end for 
processing. The database stores images of the restaurant's 
whole menu, which are compared with the captured image; a 
machine learning technique is used to train the system to 
recognize which food it is. The amount for the food ordered by 
the client is then calculated and sent to the client's mobile 
phone, and the customer pays it using mobile banking or a 
mobile app. 

The preprocessing stage consists of image processing. The 
image is converted to grey scale, and binary thresholding is 
applied to eliminate the unwanted areas of the tray and plate 
so that only the food remains. ROI approximation is the 
process of locating the food in the image. Contrast adjustment 
is the process of adjusting the color of the image so that it 
matches the images in the database. Features are extracted 
using these preprocessing techniques. For the food input, the 
preprocessing techniques applied are therefore grey 
conversion, binary thresholding, ROI approximation and 
contrast adjustment. The database images undergo the same 
preprocessing as the food input. Features are extracted from 
both inputs, the food input and the database input, and passed 
to the post-processing stage. Post-processing applies the 
machine learning technique, a deep neural network, and 
predicts the food in the image by comparison with the IoT 
cloud images. 
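
The preprocessing chain named above (grey conversion, binary thresholding, ROI approximation, contrast adjustment) can be sketched as follows. This is a minimal illustration in plain Python, not the MATLAB implementation used in this work. The fixed threshold of 128, the rectangular ROI and the linear contrast stretch are all assumptions.

```python
def to_gray(rgb_img):
    """Luma-weighted grey conversion (ITU-R BT.601 weights)."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in rgb_img]


def binarize(gray, threshold=128):
    """Binary thresholding: 1 for bright (assumed food) pixels, else 0."""
    return [[1 if v >= threshold else 0 for v in row] for row in gray]


def roi_bounds(mask):
    """Bounding rectangle (top, left, bottom, right) of the foreground."""
    rows = [i for i, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    cols = [j for j in range(len(mask[0])) if any(row[j] for row in mask)]
    return rows[0], cols[0], rows[-1], cols[-1]


def stretch_contrast(gray):
    """Linearly rescale intensities to the range 0..255."""
    lo = min(min(row) for row in gray)
    hi = max(max(row) for row in gray)
    if hi == lo:
        return [[0.0 for _ in row] for row in gray]
    return [[(v - lo) * 255.0 / (hi - lo) for v in row] for row in gray]


def preprocess(rgb_img, threshold=128):
    """Run the four steps in order; return (bounds, adjusted ROI) or None."""
    gray = to_gray(rgb_img)
    mask = binarize(gray, threshold)
    bounds = roi_bounds(mask)
    if bounds is None:
        return None
    top, left, bottom, right = bounds
    roi = [row[left:right + 1] for row in gray[top:bottom + 1]]
    return bounds, stretch_contrast(roi)
```

Running the same `preprocess` function on both the captured image and the database images mirrors the requirement that the two inputs undergo identical preprocessing before feature extraction.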

MODULES: 

> Converting RGB to HSV image 

> Bounding Box detection 

> Identification of food 

> Billing process 

> Payment process 

VI. MODULE DESCRIPTION: 

A. CONVERTING RGB TO HSV IMAGE: 

The input RGB image is resized to a height of 320 pixels. The 
resized image undergoes two separate processing pipelines: a 
saturation-based one and a color-texture one. In the first, the 
image is gamma corrected and the RGB values are converted to 
HSV to extract the saturation channel. These values are 
automatically thresholded, and morphological operations are 
applied to clean up the resulting binary image. The second 
pipeline is based on a segmentation algorithm that works on 
both color and texture features. 
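
The saturation-based pipeline can be sketched as below. This is an illustrative Python version, not the MATLAB code used in this work. The gamma value of 2.2 and the direction of the correction are assumptions, and so is the use of Otsu's method for the automatic threshold. The morphological clean-up is omitted.

```python
import colorsys


def saturation_channel(rgb_img, gamma=2.2):
    """Gamma-correct 8-bit RGB pixels; return HSV saturation in 0..255."""
    sat = []
    for row in rgb_img:
        out_row = []
        for r, g, b in row:
            rn, gn, bn = [(c / 255.0) ** (1.0 / gamma) for c in (r, g, b)]
            _, s, _ = colorsys.rgb_to_hsv(rn, gn, bn)
            out_row.append(int(round(s * 255)))
        sat.append(out_row)
    return sat


def otsu_threshold(values):
    """Otsu's method: the level that maximizes between-class variance."""
    hist = [0] * 256
    for v in values:
        hist[v] += 1
    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0, sum0 = 0, 0
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        m0, m1 = sum0 / w0, (sum_all - sum0) / w1
        var = w0 * w1 * (m0 - m1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def foreground_mask(sat):
    """Binary image: 1 where saturation exceeds the automatic threshold."""
    t = otsu_threshold([v for row in sat for v in row])
    return [[1 if v > t else 0 for v in row] for row in sat]
```

Food tends to be more saturated than a grey tray or white plate, which is why thresholding the saturation channel separates the two in this sketch.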


B. BOUNDING BOX DETECTION: 

The segmented image is then processed to remove non-relevant 
regions. For instance, regions that touch the border of the 
image do not belong to the food regions and can thus be 
eliminated. Regions larger or smaller than predefined 
thresholds (e.g. the placemat, the tray, highlights) can be 
discarded as well. The final segmented image contains, with 
high probability, the food regions and few non-relevant ones. 
To further ensure that only a few relevant regions are 
retained for the classification phase, geometric constraints 
are used to clean up the output of the combining step. The 
bounding boxes of all the regions of interest are passed to 
the prediction phase. 
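
The border and size rules described in this module can be sketched as a simple filter over candidate bounding boxes. The Python function below is an illustration under stated assumptions: boxes use inclusive pixel coordinates `(x1, y1, x2, y2)`, bounding-box area stands in for region area, and the area limits are hypothetical parameters, not values from this work.

```python
def filter_regions(boxes, img_w, img_h, min_area, max_area):
    """Drop boxes that touch the image border or fall outside the area limits."""
    kept = []
    for x1, y1, x2, y2 in boxes:
        touches_border = (x1 <= 0 or y1 <= 0 or
                          x2 >= img_w - 1 or y2 >= img_h - 1)
        area = (x2 - x1 + 1) * (y2 - y1 + 1)
        if not touches_border and min_area <= area <= max_area:
            kept.append((x1, y1, x2, y2))
    return kept
```

On a 480 x 320 image with limits of 500 and 50000 pixels, a box starting at column 0 is rejected as a border region, a 3 x 3 highlight is rejected as too small, and a tray-sized box is rejected as too large.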

C. IDENTIFICATION OF FOOD: 

This module consists of a neural network block with an 
adaptive learning scheme for analyzing the food images. The 
images are compared with the database images to identify each 
food item and look up its cost. In this way, every food item 
on a plate is identified. Each item is displayed with its name 
and its calories. The calorie values are tabulated in the 
database. This helps customers who are on a diet decide 
whether a particular food can be eaten. 
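
The database comparison and look-up can be sketched as follows. This Python example uses a nearest-neighbour match on feature vectors as a stand-in for the adaptive neural network, which is not reproduced here. The menu entries, feature values, prices and calorie figures are hypothetical.

```python
# Hypothetical restaurant database: one entry per menu item.
MENU_DB = {
    "rice":  {"features": [0.9, 0.1, 0.2], "price": 40.0, "calories": 200},
    "curry": {"features": [0.2, 0.8, 0.3], "price": 60.0, "calories": 250},
}


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def identify(features, db=MENU_DB):
    """Return (name, price, calories) of the closest database entry."""
    name = min(db, key=lambda n: squared_distance(features, db[n]["features"]))
    entry = db[name]
    return name, entry["price"], entry["calories"]
```

Calling `identify` once per bounding box yields the name, price and calorie value that are displayed for each food item on the plate.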

D. BILLING PROCESS: 

This module calculates the amount for the food ordered by the 
client. The price of each food item is known, so the total 
amount for the food placed on the tray is calculated. 
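
The bill calculation amounts to summing the known price of every identified item. A minimal Python sketch is given below. The item names and prices are hypothetical, and the per-item grouping is an assumption about how the bill is presented.

```python
from collections import Counter


def compute_bill(identified_items, price_table):
    """Return (bill lines, total) for a list of identified food names.

    Each bill line is (name, quantity, amount), sorted by name.
    """
    counts = Counter(identified_items)
    lines = [(name, qty, price_table[name] * qty)
             for name, qty in sorted(counts.items())]
    total = sum(amount for _, _, amount in lines)
    return lines, total
```

For a tray identified as rice, curry and rice, with rice at 40.0 and curry at 60.0, the total is 140.0.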

E. PAYMENT PROCESS: 

This module sends the calculated amount to the client's mobile 
phone as a pop-up message. The payment is then made through a 
mobile app such as Google Pay or PayZapp, or through mobile 
banking. This reduces both the manual work and the time 
required. 

VII. SCREENSHOTS: 

A. INPUT IMAGE: 



B. HSV IMAGE: 



C. BOUNDING BOX DETECTION: 



D. OUTPUT IMAGE: 



E. BILL POPUP MESSAGE: 

VIII. CONCLUSION: 

We present a novel system that performs Semantic Food 
Detection, applied to the problem of food tray analysis in 
self-service restaurants. More precisely, we integrate both 
techniques, food/non-food semantic segmentation and food 
detection, through the application of two procedures: a 
probabilistic procedure that allows us to remove the 
background detections, and a custom non-maximum suppression 
procedure that avoids the occurrence of duplicate detections. 
The segmented image is then processed to remove non-relevant 
regions. For instance, regions that touch the border of the 
image do not belong to the food regions and can thus be 
eliminated. The final segmented image contains, with high 
probability, the food regions and few non-relevant ones. To 
further ensure that only a few relevant regions are retained 
for the classification phase, geometric constraints are used 
to clean up the output of the combining step. 


FUTURE ENHANCEMENT: 

In this paper, food recognition and bill calculation have been 
implemented. The system can be extended by calculating a 
discount based on the estimated time between ordering and 
delivering the food, and by detecting combo meals and applying 
their discounted prices. 